1.0 SUMMARY


Scientists have dreamed of information systems with cognitive, human-like skills for years. However, constrained by device characteristics and the rapidly increasing design complexity of traditional processing technology, little progress has been made in hardware implementation. The recently popularized memristor offers a potential breakthrough for neuromorphic computing because of its unique properties, including nonvolatility, extremely high fabrication density, and sensitivity to the history of the applied voltage and current.

In this project, we first investigated the memristor-based synapse design and the corresponding training scheme. We then analyzed design optimizations and their implementation in multi-synapse systems. With the aid of a shared training circuit and a self-training mode, performance and energy consumption can be significantly improved. Finally, an arithmetic logic unit (ALU) was designed as a case study to demonstrate the hardware implementation of a reconfigurable system built from memristor synapses. The circuit design, simulation, layout, and functional verification have all been completed.

2.0  INTRODUCTION 

Neuromorphic computing architectures imitate natural neurobiological processes by mimicking the highly parallel computing architecture of the biological brain. To realize such an architecture in hardware, at least two conditions must be satisfied at the technology level: high integration density and the ability to record the history of electrical signals. Neuromorphic computing architectures that have a large volume of memory and are adaptable to their environment have demonstrated great potential for the development of high-performance parallel computing systems [1]. Most research activities have focused on software or on the system level using conventional von Neumann computer architectures [2-3]. Developing a neuromorphic architecture at the chip level by mimicking biological systems is another important direction. However, a biological-scale hardware implementation based on traditional CMOS devices requires extremely high design complexity and cost, which makes it impractical.

The existence of the memristor was predicted as early as 1971 [4], but the first physical realization that adopted that term was reported nearly four decades later by Hewlett-Packard Laboratories (HP Labs) with their TiO2 thin-film device [5]. It soon became clear that many more materials with memristive properties had been reported since the 1960s. Yet while these devices shared some common behaviors, each operated according to a different physical phenomenon. The unique properties of the memristor make it promising for neuromorphic computing systems. First, prototype memristor devices have demonstrated scalability below 10 nm. Accordingly, memristor memories can achieve a high integration density of 100 Gbits/cm², several orders of magnitude higher than popular flash memory technologies [4-5]. Second, the memristor device has an intrinsic and remarkable feature called the “pinched hysteresis loop,” which means it can “remember” the total electric charge that has flowed through it [4,6]. Third, the memristance remains unchanged when power is turned off. Consequently, memristor-based memory, combining high integration capability with pinched hysteresis characteristics, can be applied to a massively parallel, large-scale neuromorphic computing processor architecture.

Many memristor-based circuit designs have been explored, such as crossbar nonvolatile memory [8] and FPGAs [9]. Strukov et al. integrated digital memory, programmable Boolean logic circuits, and neural networks within a 3D hybrid CMOS/memristor structure. Rajendran et al. proposed a memristor-based programmable threshold logic array [10] and used it in a synapse-neuron structure [11]. However, training circuits and training schemes for memristor-based reconfigurable architectures have not yet been fully explored.

Therefore, in this project, we investigated memristor-based reconfigurable design techniques. The structure is built upon a single-memristor synapse and its corresponding training circuit. An 8-bit ALU built on synapse structures was used as a case study to demonstrate its potential for developing a neuromorphic computing processor architecture. The ALU design, composed of ~100 synapses, can be adaptively trained to realize addition, subtraction, and binary counting. The circuit design, simulation, layout, and functional verification have been completed in the project.

3.0  METHODS,  ASSUMPTIONS,  AND  PROCEDURES 

3.1.  Memristor  Theory 

Nearly forty years ago, Professor Chua predicted the existence of the memristor, the fourth fundamental circuit element, to complete the set of passive devices that previously included only the resistor, capacitor, and inductor [4]. The memristor uniquely defines the relationship between the magnetic flux (φ) and the electric charge (q) passing through the device,

$$d\varphi = M \cdot dq. \tag{1}$$

Considering that magnetic flux and electric charge are the integrals of voltage (V) and current (I) over time, respectively, the definition of the memristor can be generalized as

$$V = M(\omega, I) \cdot I, \qquad \frac{d\omega}{dt} = f(\omega, I), \tag{2}$$

where ω is a state variable and M(ω, I) represents the instantaneous memristance, which varies over time. For an “ideal” memristor, neither M(ω, I) nor f(ω, I) can be expressed as a function of the current I alone.


Based on this mathematical description, these devices remained primarily intellectual curiosities until HP Labs first used these relationships to describe the memristive switching effect created by moving the doping front along a TiO2 thin-film device [4]. Soon, more memristive systems were identified according to their behavior, including spintronic devices [5-6], polymeric thin films [12-13], MgO-based magnetic tunnel junctions (MTJs) [14-15], and AlAs/GaAs/AlAs quantum-well diodes [16].

An intrinsic and remarkable feature of the memristor is the “pinched hysteresis loop”; that is, memristors can “remember” the total electric charge that has flowed through them by changing their resistance (memristance) [17]. These unique properties create great opportunities for future system design. For instance, HP researchers proposed a memristor-based architecture that could change the standard computing paradigm by enabling calculations to be performed in the chips where data is stored, rather than in a specialized central processing unit [18]. Moreover, applications of this memristive behavior in electronic neural networks have been extensively studied [19-20].

Figure 1(a) illustrates the conceptual view of the Pt/TiO2/Pt structure: two orthogonal metal wires (Pt) serve as the top and bottom electrodes, with a thin titanium dioxide film sandwiched in between. A perfect TiO2 structure in its natural state is an insulator. However, the conductivity of oxygen-deficient titanium dioxide (TiO2−x) is much higher. By moving the doping front under proper electrical excitation, intermediate memristive states can be achieved. We use RH and RL to denote the total resistance when a TiO2 memristor is fully undoped (maximum, high resistance) and fully doped (minimum, low resistance), respectively. The overall memristance is then equivalent to two serially connected resistors, as shown in Figure 1(b). That is,

$$M(\alpha) = \alpha \cdot R_L + (1 - \alpha) \cdot R_H, \tag{3}$$

where α (0 ≤ α ≤ 1) is the relative doping front position, i.e., the ratio of the doping front position to the total thickness of the TiO2 device.


Figure 1. TiO2 thin-film memristor: (a) TiO2 memristor structure; (b) equivalent circuit
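
As an illustration of Eqs. (2) and (3), the sketch below takes the relative doping front position α as the state variable ω and integrates it under a sinusoidal voltage. The state equation used here, dα/dt = μv · RL · I / D², is the linear dopant-drift form commonly used for the HP TiO2 device; it is an assumption of this sketch, as are the values of μv and D. RL = 1 kΩ and RH = 16 kΩ match the memristance range used in Section 4. Whenever the applied voltage crosses zero, the current is also zero, which is the pinched hysteresis loop described in Section 2.0.

```python
import math

R_L = 1e3      # fully doped (low) resistance in ohms, 1 kOhm as in Section 4
R_H = 16e3     # fully undoped (high) resistance in ohms, 16 kOhm as in Section 4
MU_V = 1e-14   # assumed dopant mobility, m^2/(V*s)
D = 10e-9      # assumed film thickness, m


def memristance(alpha):
    """Eq. (3): doped and undoped regions in series."""
    return alpha * R_L + (1.0 - alpha) * R_H


def simulate(v_amp=0.5, freq=10.0, alpha0=0.1, steps=20000):
    """Forward-Euler integration over one period of a sine drive.

    Returns a list of (t, v, i, alpha) samples. The state equation is the
    assumed linear drift model d(alpha)/dt = MU_V * R_L * i / D**2.
    """
    dt = 1.0 / (freq * steps)
    alpha = alpha0
    trace = []
    for k in range(steps + 1):
        t = k * dt
        v = v_amp * math.sin(2.0 * math.pi * freq * t)
        i = v / memristance(alpha)           # Eq. (2): V = M * I
        trace.append((t, v, i, alpha))
        alpha += MU_V * R_L / D**2 * i * dt  # positive current raises alpha, lowering M
        alpha = min(max(alpha, 0.0), 1.0)    # doping front stays inside the film
    return trace


trace = simulate()
quarter = len(trace) // 4
# Same voltage on the rising and falling side of the first half-cycle:
v_up, i_up = trace[quarter // 2][1:3]
v_dn, i_dn = trace[3 * quarter // 2][1:3]
print(f"V = {v_up:.3f} V: I = {i_up * 1e6:.2f} uA (rising), {i_dn * 1e6:.2f} uA (falling)")
```

The two currents differ at the same voltage because the state α has moved in between; that memory of past current is the behavior the synapse designs below rely on.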

3.2. The Memristor-based Logic Design


3.2.1 The Principle of Memristor-based Synapse. Rather than using a memristor crossbar array in a neuromorphic reconfigurable architecture, we proposed a memristor-based synapse design that mimics the biological structure. Figure 2 depicts the conceptual scheme, which simply consists of an NMOS transistor (Q) and a memristor. When the input Vin is low, the transistor Q is turned off, and the output Vout is connected to ground through the memristor. Conversely, when Vin is high and turns Q on, the memristance M and the equivalent transistor resistance (RQ) together determine Vout,

$$V_{out} = f(V_{in} \cdot M), \tag{4}$$

where Vout is weighted by the memristance. The memristance therefore plays the role of a variable synaptic weight.


Figure 2. A memristor-based synapse design (Vout = Vin · M)

Note that the response of the synapse design depends on the equivalent resistance (effectively, the size) of transistor Q (RQ). A larger Q offers a wider range of Vout but with poorer linearity. However, once Q is large, further increasing its size widens the Vout range only marginally. Simulation results showing this are given in Section 4.1.1.
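
A rough way to see how RQ and M together set Vout is to treat the conducting transistor as a fixed resistance RQ in series with the memristor, with the high input level applied across the pair, so that Vout ≈ Vin · M / (M + RQ) when Q is on. This linear-resistor model, the turn-on threshold, and the RQ values are our simplifying assumptions: a real NMOS is nonlinear, which is what produces the linearity trade-off noted above, so the sketch only shows the trend that a larger memristance yields a larger Vout.

```python
def synapse_vout(v_in, m, r_q=4e3, v_th=0.5):
    """Idealized synapse of Figure 2 (assumed linear model, not a transistor model).

    v_in: input voltage in volts; Q is treated as off below v_th.
    m: memristance in ohms; r_q: assumed on-resistance of Q in ohms.
    """
    if v_in < v_th:
        return 0.0                      # Q off: Vout pulled to ground through M
    return v_in * m / (m + r_q)         # Q on: resistive divider between RQ and M


for r_q in (2e3, 4e3, 8e3):
    row = [synapse_vout(1.8, m, r_q) for m in (1e3, 4e3, 16e3)]
    print(f"RQ = {r_q / 1e3:.0f} kOhm:", ", ".join(f"{v:.2f} V" for v in row))
```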

To improve design stability, a buffer can be added at the output of the synapse to increase the voltage swing. Furthermore, circuit optimization techniques, such as asymmetric gates in other blocks, can be used to minimize the area of the overall synapse-based system.

3.2.2  Synapse  Training  Circuit.  Being  self-adaptive  to  the  environment  is  one  of  the  most 
important  properties  of  a  biological  synapse.  To  accomplish  a  similar  functionality,  a 
training  block  is  needed  in  the  memristor-based  synapse  to  adjust  its  memristance. 

The training circuit compares the generated result Vout with the expected result Dtrain to decide whether training is needed. The corresponding voltages Vtop and Vbot are then generated and applied to the two terminals of the memristor. Figure 3 shows the symbol of the synapse training circuit.


Figure 3. Synapse training circuit symbol (outputs Vtop and Vbot)

Figure 4 shows the diagram of a training circuit for a single-synapse design, derived by logic analysis and simplification. It includes two major components: a training controller and a write driver. By comparing the current synapse output Vout with the expected output Dtrain, the training controller generates the control signals. The write driver uses these signals to control two pairs of NMOS and PMOS switches and to supply the training voltage pair Vtop and Vbot, which is then applied to the two terminals of the memristor in the synapse design.


Figure 4. Synapse training circuit diagram (training controller: latch, inverters INV1–INV3, OR and NAND gates, with inputs Vout and Dtrain; write driver: switch pairs between Vdd and GND producing Vtop and Vbot across memristor M)

Table 1 summarizes the operating conditions of the proposed training circuit design. The training circuit can work in two modes, determined by the training enable signal E.

Table 1. Training Circuit Operation Conditions

E | Vout | Dtrain | Vtop | Vbot | Vmem | Status
0 | X | X | Floating | 0 | X | Operating
1 | 1/0 | same as Vout | 0 | 0 | 0 V | No training
1 | 1 | 0 | 1 | 0 | 1.8 V | RH to RL
1 | 0 | 1 | 0 | 1 | −1.8 V | RL to RH

* ‘0’: logic low; ‘1’: logic high; ‘X’: unknown or don’t care.

• Operating mode: When E = 0, the synapse operated in the regular (read) mode, and the training circuit was disabled.

• Training mode: The training circuit was enabled when E = 1. By comparing the current synapse output Vout with the expected Dtrain, the training circuit generated Vtop and Vbot, which were applied to the two terminals of the memristor to update or keep its memristance. We define Vmem = Vtop − Vbot.
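
Table 1 and the two modes above amount to a small decision function. The sketch encodes that truth table for logic-level inputs; representing a floating terminal as None and the function name training_drive are our own illustrative choices, not part of the circuit.

```python
from typing import Optional, Tuple

VDD = 1.8  # supply voltage in volts (TSMC 0.18 um design in the report)


def training_drive(e: int, vout: int, dtrain: int) -> Tuple[Optional[float], float, str]:
    """Return (Vtop, Vbot, status) for logic-level inputs, following Table 1.

    Vtop is None when the terminal is left floating.
    """
    if e == 0:
        return None, 0.0, "operating"      # training circuit disabled
    if vout == dtrain:
        return 0.0, 0.0, "no training"     # Vmem = 0 V keeps the memristance
    if vout == 1:                          # output high but low expected
        return VDD, 0.0, "RH to RL"        # Vmem = +1.8 V
    return 0.0, VDD, "RL to RH"            # output low but high expected: Vmem = -1.8 V


for e, vout, dtrain in [(0, 1, 0), (1, 1, 1), (1, 1, 0), (1, 0, 1)]:
    vtop, vbot, status = training_drive(e, vout, dtrain)
    vmem = None if vtop is None else vtop - vbot
    print(e, vout, dtrain, vtop, vbot, vmem, status)
```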

Figure 5 depicts the proposed memristor-based synapse integrated with the training circuit. An extra NMOS transistor Q2 was inserted in the synapse to isolate the training operation from other voltage sources. When E = 1, Q2 was turned off so that the two terminals of the memristor were controlled only by the training circuit and not by Vin.


Figure  5.  Synapse  together  with  training  circuit 

3.2.3 Multi-synapse Training Scheme. Most neuron systems are constructed from multiple synapses. In this section, we discuss the corresponding training scheme for a 2-synapse neuron, an example of which is shown in Figure 6. Here, A1 and A2 are two synapse inputs received from other neurons. M1 and M2 are the memristor-based weights of the two synapses S1 and S2, and N denotes a neuron with output Vout. The value of Vout depends on the functionality of N as well as on Vout1 and Vout2 from the two synapses. With different combinations of M1 and M2, the two-input neuron obtains different functionalities.


Figure 6. Two-input neuron structure (inputs A1 and A2)

To save design cost, the memristances of the two synapses can be trained separately while sharing one training circuit. Figure 7 shows a training sharing distribution circuit, which generates the training signals that control M1 and M2. The operation of the training sharing circuit under different conditions is shown in Table 2.

The two synapse inputs A1 and A2 can be used to determine which memristor, M1 or M2, is being trained. Table 3 lists the required A1 and A2 when the logic functionality of N is one of the following: OR/NOR, XOR/XNOR, or AND/NAND.


Figure 7. Training sharing distribution circuit (outputs Vtop1, Vbot1, Vtop2, Vbot2)


Table 2. Training Sharing Circuit Operation

Status | Vtop1 | Vbot1 | Vtop2 | Vbot2
Training M1 | Vtop | Vbot | Floating | 0
Training M2 | Floating | 0 | Vtop | Vbot

Table 3. Synapse Input Pairs for Different Logic Values

Functionality of N | Training M1 | Training M2
OR/NOR | A1 = 1, A2 = 0 | A1 = 0, A2 = 1
XOR/XNOR | A1 = 1, A2 = 0 | A1 = 0, A2 = 1
AND/NAND | A1 = 1, A2 = 1 | A1 = 1, A2 = 1
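
Tables 2 and 3 together describe how one training circuit serves two synapses: the input pair selects the target memristor, and the distribution circuit routes (Vtop, Vbot) to it while leaving the other memristor's top terminal floating and its bottom terminal at 0. The sketch covers only the OR/NOR and XOR/XNOR rows of Table 3; the function names are hypothetical, and the dictionary keys mirror the signal names in Table 2.

```python
from typing import Dict, Optional

# Which (A1, A2) pair activates which synapse for an OR/NOR or XOR/XNOR neuron (Table 3).
TRAIN_INPUTS = {1: (1, 0), 2: (0, 1)}


def select_target(a1: int, a2: int) -> int:
    """Return the index of the memristor under training for the applied inputs."""
    for target, pair in TRAIN_INPUTS.items():
        if pair == (a1, a2):
            return target
    raise ValueError("this input pair does not isolate a single synapse")


def distribute(target: int, vtop: float, vbot: float) -> Dict[str, Optional[float]]:
    """Route the shared training pair as in Table 2 (None marks a floating terminal)."""
    if target == 1:
        return {"Vtop1": vtop, "Vbot1": vbot, "Vtop2": None, "Vbot2": 0.0}
    if target == 2:
        return {"Vtop1": None, "Vbot1": 0.0, "Vtop2": vtop, "Vbot2": vbot}
    raise ValueError("target must be 1 or 2")


print(distribute(select_target(0, 1), 1.8, 0.0))
# {'Vtop1': None, 'Vbot1': 0.0, 'Vtop2': 1.8, 'Vbot2': 0.0}
```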

3.2.4 Training Block Sharing Scheme. Since the two memristors could be trained separately, it was possible to share a training circuit between two synapses. By doing this, we achieved the same functionality with reduced design cost. The diagram of the 2-synapse shared training circuit and the sharing distribution circuit is depicted in Figure 8.


Figure  8.  2-synapse  shared  training  circuit  and  sharing  distribution  circuit 

3.2.5 Two-level OR Neuron Training Strategy. Expanding the synapse design to multiple levels can provide a more powerful reconfigurable design. For example, Figure 9 shows a 2-level OR neuron circuit built from the previous 2-synapse structure. Such a design can realize 16 possible logic functions at its output.


$$V_{out} = A_1 \cdot M_1 \cdot M_5 + A_2 \cdot M_2 \cdot M_5 + A_3 \cdot M_3 \cdot M_6 + A_4 \cdot M_4 \cdot M_6$$

Possible logic values | Number
0 | 1
A1, A2, A3, A4 | 4
A1+A2, A1+A3, A1+A4, A2+A3, A2+A4, A3+A4 | 6
A1+A2+A3, A1+A2+A4, A1+A3+A4, A2+A3+A4 | 4
A1+A2+A3+A4 | 1

Figure 9. Two-level OR neuron circuit (synapses S1–S6 with weights M1–M6) and possible logic values table

Detailed analysis showed that the memristances of M5 and M6 did not introduce additional logic functionality. Training only M1 through M4 while keeping M5 and M6 high achieved all 16 possible logic functions. Similar to 2-synapse training, we trained M1–M4 separately by activating one synapse branch at a time. To do this, we applied ‘1’ to the input of the activated branch and ‘0’ to the inputs of all the other branches. For example, to train M1, we set A1 = 1 and A2 = A3 = A4 = 0.
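
The claim that M5 and M6 add no functionality can be checked exhaustively if each memristance is abstracted to a Boolean weight (high = 1, as in the Boolean expression of Figure 9). That abstraction is our assumption; the sketch enumerates every weight setting and compares the distinct truth tables with those reachable by training only M1–M4.

```python
from itertools import product


def vout(a, m):
    """Boolean form of the two-level OR neuron in Figure 9.

    a = (A1, A2, A3, A4); m = (M1, ..., M6) with 1 = high memristance.
    """
    a1, a2, a3, a4 = a
    m1, m2, m3, m4, m5, m6 = m
    return (a1 & m1 & m5) | (a2 & m2 & m5) | (a3 & m3 & m6) | (a4 & m4 & m6)


def truth_table(m):
    """Outputs over all 16 input patterns; two weight settings realize the
    same logic function exactly when their truth tables are equal."""
    return tuple(vout(a, m) for a in product((0, 1), repeat=4))


all_funcs = {truth_table(m) for m in product((0, 1), repeat=6)}
m1_to_m4 = {truth_table(m + (1, 1)) for m in product((0, 1), repeat=4)}
print(len(all_funcs), len(m1_to_m4), all_funcs == m1_to_m4)  # 16 16 True
```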

Figure 10 illustrates the training steps. We used the same training case as before: M1 through M4 were set high at the beginning, trained to low, and finally trained back to high.

Figure 10. Two-synapse shared training strategy

3.2.6 Ten-synapse Circuit. Figure 11 gives an example of a 3-level design with ten synapses. Each sub-block is composed of two synapses with an OR gate as the neuron function. M1–M10 denote the weights of the ten synapses in the circuit. The functionality of this structure can be summarized as

$$V_{out} = A_1 \cdot X_1 + A_2 \cdot X_2. \tag{5}$$


Figure  11. 10-synapse  OR  neuron  circuit 

X1 and X2 are simplified combinations of M1–M10. Theoretically, this circuit has the same functionality as the 2-synapse OR neuron. However, the redundant design is more robust, with higher fault tolerance: even if some devices are damaged, the structure can self-heal and still obtain the required logic. For example, when M5 and M6 are open and appear as high memristance due to process damage, Vout can still realize all four logic combinations.

In this design, we kept M5–M10 high at all times and trained only M1–M4. Applying ‘1’ to A1 and ‘0’ to A2 trained M1 and M3 simultaneously. We then applied ‘0’ to A1 and ‘1’ to A2 to train M2 and M4 at the same time. Figure 12 illustrates this training scheme.

Figure 12. Timing diagram of 10-synapse circuit training (signals A1, A2, D_train, E, Vout, and M1–M4; sequence: logic test with M1 and M2 high, Vout = A1 + A2; training; logic test with M1 and M2 low, Vout = 0; training; logic test with M1 and M2 high, Vout = A1 + A2)


3.3.  Case  Study  -  Synapse-based  ALU  Design 

We designed an 8-bit arithmetic logic unit (ALU) using memristor-based synapses. The ALU can be used for addition, subtraction, and counting. In the project, we completed the circuit design, simulation, layout, and functional verification. The design details are explained in the following subsections.

3.3.1 1-bit Adder-Subtractor Block. Figure 13 shows the schematic of the 1-bit adder-subtractor block built from synapses.


Figure  13.  Schematic  of  1-bit  adder-subtractor  block 

The full adder has three inputs (IN0, IN1, and C0) and two outputs (Sum and Carry) with the following relations:

$$\begin{aligned} \text{Sum} &= IN_0 \oplus IN_1 \oplus C_0 \\ \text{Carry} &= IN_0 \cdot IN_1 + C_0 \cdot (IN_0 \oplus IN_1) \end{aligned} \tag{6}$$

This full adder design can be used as an unsigned subtractor by inverting the subtrahend and setting C0 = 1:

$$\begin{aligned} \text{Sum} &= IN_0 \oplus \overline{IN_1} \oplus C_0 \\ \text{Carry} &= IN_0 \cdot \overline{IN_1} + C_0 \cdot (IN_0 \oplus \overline{IN_1}) \end{aligned} \tag{7}$$

In  the  1-bit  adder-subtractor  design,  three  synapse  blocks  bridged  the  input  signals  and 
activation  functions.  Based  on  the  required  functions,  i.e.,  adding  or  subtracting,  the 
weights  in  these  synapses  could  be  trained  accordingly.  For  details  of  the  synapse  design, 
refer  to  Section  3.2. 
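
Eqs. (6) and (7) can be checked in software before committing them to synapse weights. The sketch below is plain Boolean logic, not a model of the synapse circuit; the function names and the ripple chaining of blocks (with 8 bits as the default, mirroring the case study) are illustrative assumptions.

```python
def add_sub_bit(in0, in1, c0, subtract=False):
    """One block: Eq. (6) when adding, Eq. (7) (IN1 inverted) when subtracting."""
    b = in1 ^ 1 if subtract else in1
    s = in0 ^ b ^ c0
    carry = (in0 & b) | (c0 & (in0 ^ b))
    return s, carry


def ripple(x, y, width=8, subtract=False):
    """Chain `width` blocks LSB first; the first C0 is 1 when subtracting."""
    carry = 1 if subtract else 0
    result = 0
    for i in range(width):
        s, carry = add_sub_bit((x >> i) & 1, (y >> i) & 1, carry, subtract)
        result |= s << i
    return result, carry


print(ripple(100, 27), ripple(100, 27, subtract=True))  # (127, 0) (73, 1)
```

In subtract mode the final carry is 1 exactly when no borrow occurs, i.e., when the minuend is at least as large as the subtrahend.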

3.3.2 Binary Counter. An m-bit binary counter uses m digital bits to represent 2^m numbers. It increments by 1 every clock cycle and starts over from 0 once all the digital bits are 1s. We assume the outputs of an m-bit binary counter at the nth and (n+1)th clock cycles are $Q^n = q_{m-1}^n \cdots q_1^n q_0^n$ and $Q^{n+1} = q_{m-1}^{n+1} \cdots q_1^{n+1} q_0^{n+1}$, respectively. Then we have

$$q_0^{n+1} = \overline{q_0^n}, \qquad q_i^{n+1} = q_i^n \oplus c_{i-1}^n \quad (i = 1, \ldots, m-1), \tag{8}$$

where $c_{i-1}^n = q_{i-1}^n \cdots q_0^n$ is the carry into bit i.

We used m of the above adder-subtractor blocks to build a binary counter. For example, Figure 14 shows a 4-bit counter based on the adder-subtractor blocks.

Figure 14. Schematic of 4-bit binary counter built with adder-subtractor blocks
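
The next-state rule of Eq. (8) can be exercised directly: each bit is XORed with the carry formed by all lower bits, which is what each adder-subtractor block's Sum output provides when its other data input is held at 0. The sketch below is a behavioral check of the equation only; the bit-list representation (index 0 = q0) is our choice.

```python
def next_state(bits):
    """Eq. (8). bits[0] is q0 (LSB); returns the state after one clock."""
    nxt = [bits[0] ^ 1]            # q0 toggles every cycle
    carry = bits[0]                # carry into bit 1 is q0
    for q in bits[1:]:
        nxt.append(q ^ carry)      # qi(n+1) = qi(n) XOR c(i-1)
        carry &= q                 # c(i) = qi AND c(i-1)
    return nxt


def to_int(bits):
    return sum(b << i for i, b in enumerate(bits))


state = [0, 0, 0, 0]               # 4-bit counter, as in Figure 14
seq = []
for _ in range(18):
    seq.append(to_int(state))
    state = next_state(state)
print(seq)  # 0 through 15, then wraps around to 0, 1
```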

3.3.3 Decade Counter. A decade counter has four output pins to represent the decimal numbers 0–9. The corresponding binary outputs range from ‘0000’ to ‘1001’. On the rising edge of each clock cycle, the output increments by 1, or resets to ‘0000’ after it reaches ‘1001’. By properly modifying the adder-subtractor block design, we also built a decade counter.

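We assume that the outputs of a decade counter at the nth and (n+1)th clock cycles are $Q^n = q_3^n q_2^n q_1^n q_0^n$ and $Q^{n+1} = q_3^{n+1} q_2^{n+1} q_1^{n+1} q_0^{n+1}$, respectively. Based on the Karnaugh map, we have

$$q_0^{n+1} = \overline{q_0^n}, \tag{9a}$$

$$q_1^{n+1} = q_1^n \overline{q_0^n} + \overline{q_3^n}\,\overline{q_1^n}\,q_0^n, \tag{9b}$$

$$q_2^{n+1} = q_2^n \overline{q_1^n} + q_2^n \overline{q_0^n} + \overline{q_2^n}\,q_1^n q_0^n, \tag{9c}$$

$$q_3^{n+1} = q_3^n \overline{q_0^n} + q_2^n q_1^n q_0^n. \tag{9d}$$

Eq. (9a) could be realized with a 1-bit adder by setting IN0 = 0, IN1 = 1, and C0 = $q_0^n$, so that $q_0^{n+1} = \text{Sum} = 0 \oplus 1 \oplus q_0^n = \overline{q_0^n}$.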

Eq. (9b) could be transformed to $q_1^{n+1} = q_1^n \overline{q_0^n}\,(1 + \overline{q_3^n}) + \overline{q_3^n}\,\overline{q_1^n}\,q_0^n = q_1^n \overline{q_0^n} + \overline{q_3^n}\,(q_0^n \oplus q_1^n)$. This form has a shape similar to the Carry output in Eq. (7), and hence Eq. (9b) could be obtained from the Carry path of the adder-subtractor block by driving IN0, IN1, and C0 with signals derived from $q_1^n$, $q_0^n$, and $q_3^n$.

By using DeMorgan’s law, Eq. (9c) could be changed to $q_2^{n+1} = q_2^n\,(\overline{q_1^n} + \overline{q_0^n}) + \overline{q_2^n}\,q_1^n q_0^n = q_2^n \oplus (q_1^n q_0^n)$. By slightly modifying the adder-subtractor block as shown in Figure 15, one could realize Eq. (9c).


Figure 15. Schematic of the modified adder-subtractor block to realize Eq. (9c) (synapse, Carry, and Sum paths)

Similarly,  we  could  add  one  inverter  and  two  synapse  blocks  in  the  adder-subtractor 
block  to  realize  Eq.  (9d),  as  seen  in  Figure  16. 

Figure 16. Schematic of the modified adder-subtractor block to realize Eq. (9d)
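
A quick behavioral check of Eqs. (9a)–(9d), independent of the synapse hardware, is to iterate them from ‘0000’ and confirm the 0–9 cycle, and to confirm that the rewritten forms of (9b) and (9c) given above agree with the originals on every 4-bit state. The helper names are illustrative.

```python
def inv(x):
    """Logical NOT for 0/1 values (the overbar in Eq. (9))."""
    return 1 - x


def decade_next(q3, q2, q1, q0):
    """Next decade-counter state from Eqs. (9a)-(9d)."""
    d0 = inv(q0)                                                # (9a)
    d1 = (q1 & inv(q0)) | (inv(q3) & inv(q1) & q0)              # (9b)
    d2 = (q2 & inv(q1)) | (q2 & inv(q0)) | (inv(q2) & q1 & q0)  # (9c)
    d3 = (q3 & inv(q0)) | (q2 & q1 & q0)                        # (9d)
    return d3, d2, d1, d0


def bits(n):
    return tuple((n >> k) & 1 for k in (3, 2, 1, 0))  # (q3, q2, q1, q0)


state, seq = bits(0), []
for _ in range(12):
    seq.append(int("".join(map(str, state)), 2))
    state = decade_next(*state)
print(seq)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1]

# The rewritten forms of (9b) and (9c) agree with the originals on all 16 states.
for n in range(16):
    q3, q2, q1, q0 = bits(n)
    _, d2, d1, _ = decade_next(q3, q2, q1, q0)
    assert d1 == (q1 & inv(q0)) | (inv(q3) & (q0 ^ q1))
    assert d2 == q2 ^ (q1 & q0)
```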

3.3.4  4-bit  ALU  as  Adder,  Subtractor,  and  Decade  Counter.  Figure  17  shows  the  schematic 
of  a  4-bit  ALU,  which  could  add,  subtract,  and  count  decimal  numbers. 

3.3.5 8-bit ALU as Adder, Subtractor, and Decade Counter. We built an 8-bit ALU using the basic block shown in Figure 18. It can perform 8-bit addition, subtraction, or binary counting. The design contains ~100 synapses. In development, we adopted a CMOS-only design (a) to demonstrate the design concept and (b) to avoid the risks of the immature memristor fabrication process. Taiwan Semiconductor Manufacturing Co. (TSMC) 0.18 μm technology was used to reduce cost. The schematic and layout are shown in Figure 18 and Figure 19, respectively.

Figure 19. Layout of an 8-bit ALU as adder, subtractor, and binary counter

4.0 RESULTS AND DISCUSSION


In this section, we present simulation results that validate the effectiveness of the proposed memristor-based synapse design. We also verify the functionality of the ALU case study as an adder, subtractor, and binary counter.

4.1.  Simulation  Verification  of  Memristor-based  Logic  Design 

4.1.1 The Memristor-based Synapse. Figure 20 shows the relation between the input and output signals of the memristor-based synapse proposed in Figure 2. Here, Vin is the input of the synapse design, and Vout is its output signal. When Vin was low, Vout was connected to ground through the memristor and hence was low. When Vin rose high, Vout settled at an intermediate value determined by the memristance M together with the equivalent resistance of Q (RQ).


Figure 20. Output response of a synapse when memristor is at high resistance state (top: Vin (V); bottom: Vout (V); horizontal axis: time, 0–20 ns)

Figure 21 shows the response of Vout as the memristance changes from 1 kΩ to 16 kΩ. Here, the CMOS devices used TSMC 0.18 μm technology. In general, Vout increased as the memristance became higher. The response of the synapse design relied on the equivalent resistance of transistor Q (RQ), that is, on the size of Q. This was demonstrated in Figure 21 by sweeping the width of Q from 220 nm to 4.4 μm in 220 nm steps. The simulation showed that a larger Q can result in a wider range of Vout but with poorer linearity. However, once Q is large, further increasing its size enhances the Vout range only marginally.
Figure 21. Output voltage of a memristor-based synapse vs. memristance

As noted in Section 3.2.1, a buffer can be added at the output of the synapse to increase the voltage swing, and circuit optimization techniques such as asymmetric gates in other blocks (Section 4.1.3) can be used to minimize the area of the overall synapse-based system.

4.1.2 Synapse Training Circuit. The timing diagram of the training circuit is shown in Figure 22. Before a training procedure starts, a sensing step is required to detect the current Vout so that it can be compared with Dtrain. Accordingly, in the sensing phase, the training enable signal E was set low for a very short period of time, e.g., 4.5 ns, at the beginning of training. At the same time, Vout was sent to the latch, whose output Vout' remained constant during one training period, as shown in Figure 4. In the training phase, E was set back to high for a much longer time, 51 ms in our design, to change the memristance if needed.

Figure 22. The timing diagram of training circuit

We tested the training procedure using the TiO2 memristor model [4]. The training circuit was designed with TSMC 0.18 μm technology and VDD = 1.8 V. Changing the memristance from RH to RL or vice versa required about 51 ms. The simulation result is shown in Figure 23. Here, the memristance was initialized to M = 16 kΩ. Over the first 51 ms, the memristor was trained to 1 kΩ by setting Dtrain low. Then, at t = 51 ms, we set Dtrain high and trained the memristance back to RH over the following 51 ms.

Figure 23. The simulation result of memristor training

4.1.3 Asymmetric Gate Design. Since the size of Q1 affects the range of Vout, an asymmetric gate design can be adopted to minimize the layout area of the synapse instead of adding a buffer or using a very large Q1. More specifically, we tuned the P-type/N-type transistor (P/N) ratio of INV1 in the training circuit of Figure 4. Table 4 summarizes the required sizes of INV1 and Q1 for the parameter combinations that trained successfully. The results show that the asymmetric design with P/N ratio = 0.5 obtains the smallest area. This last option was used in the subsequent synapse analysis.

Table 4. Sizing of INV1 and Q1

P/N Ratio | PMOS/NMOS in INV1 | Q1
2 | 720 nm / 360 nm; 440 nm / 220 nm | 18 × 220 nm; 16 × 220 nm
1 | 360 nm / 360 nm; 220 nm / 220 nm | 12 × 220 nm; 11 × 220 nm
0.5 | 360 nm / 360 nm; 220 nm / 440 nm | 9 × 220 nm; 9 × 220 nm

4.1.4 2-Synapse Design with an OR Logic Neuron. To demonstrate the functionality of a circuit composed of multiple synapses, we used a 2-synapse circuit with an OR logic neuron. The functionality of this structure can be summarized as

$$V_{out} = A_1 \cdot M_1 + A_2 \cdot M_2. \tag{10}$$

Depending on the combination of M1 and M2, the structure can be configured into four possible logic functions: 0, A1, A2, and A1+A2. Eq. (10) shows that M1 and M2 are independent of each other, so each path can also be trained separately. By applying ‘1’ to A1 and ‘0’ to A2 and comparing Vout with Dtrain, we can train memristor M1. Similarly, memristor M2 can be trained independently by applying ‘0’ to A1 and ‘1’ to A2.


Figure 24. A design with two synapses and its four possible outputs
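
Because Eq. (10) makes the two paths independent, one-branch-at-a-time training with the Table 1 comparison rule reaches any of the four functions from any starting state. The sketch abstracts each memristance to a Boolean weight (1 = high) and each training step to a single full swing; both abstractions, and the function names, are our assumptions rather than a model of the circuit timing.

```python
from itertools import product


def vout(a1, a2, m1, m2):
    """Eq. (10) with Boolean weights (1 = high memristance)."""
    return (a1 & m1) | (a2 & m2)


def train(target, m):
    """Train M1 then M2, activating one synapse branch at a time.

    target: desired truth table, a dict mapping (A1, A2) -> Vout.
    m: initial [M1, M2]; returns the trained weights.
    """
    m = list(m)
    for idx, inputs in ((0, (1, 0)), (1, (0, 1))):
        d_train = target[inputs]
        out = vout(*inputs, *m)
        if out == 1 and d_train == 0:
            m[idx] = 0          # Table 1: Vmem = +1.8 V, RH to RL
        elif out == 0 and d_train == 1:
            m[idx] = 1          # Table 1: Vmem = -1.8 V, RL to RH
    return m


# Target: Vout = A1, starting from M1 = M2 = high.
want = {a: vout(*a, 1, 0) for a in product((0, 1), repeat=2)}
print(train(want, [1, 1]))  # [1, 0]
```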


Two of the one-cell training blocks shown in Figure 4 can be used to train M1 and M2
individually. Such a 2-synapse circuit with training blocks is shown in Figure 25. Here, we
added Q2 and Q4 to help control the synapse training. When the training enable E was low,
both Q2 and Q4 were turned on to generate Vout1 and Vout2, respectively. When E was high
and the circuit was in training mode, either Q2 or Q4 was turned off to train M1 or M2,
respectively. The two PMOS transistors Q2 and Q4 thus control access to two different
memristor rail paths, i.e., synapse 1 and synapse 2, during both reading and training.
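The control rule for Q2 and Q4 can be written as a small lookup. This is a hypothetical sketch (the name `access_transistors` is illustrative): the text states only which transistor is turned off in training mode, so the state of the other transistor in that mode is an assumption, flagged in the code.

```python
def access_transistors(e: int, target: int = 0) -> dict[str, bool]:
    """On (True) / off (False) state of the PMOS access transistors Q2 and Q4.

    e      -- training enable E: 0 = read/logic mode, 1 = training mode
    target -- synapse to train when e == 1 (1 -> M1, 2 -> M2)

    ASSUMPTION: in training mode the non-selected transistor is left on;
    the report only states which transistor is turned off.
    """
    if e == 0:
        return {"Q2": True, "Q4": True}   # both on: Vout1 and Vout2 are generated
    if target == 1:
        return {"Q2": False, "Q4": True}  # Q2 off while M1 is trained
    if target == 2:
        return {"Q2": True, "Q4": False}  # Q4 off while M2 is trained
    raise ValueError("training mode (e == 1) needs target 1 or 2")


for e, target in [(0, 0), (1, 1), (1, 2)]:
    print(e, target, access_transistors(e, target))
```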


The simulation results are also shown in Figure 25. This case study started with both M1
and M2 high. First, we trained each of them separately to low and then changed them back
to high. To verify the training results, a logic test was conducted before and after each
training phase, giving the three logic tests in Figure 25(a). Because a logic test
(nanoseconds) is much faster than the training process (milliseconds), the logic tests at
0 s, 200 ms, and 400 ms are magnified as insets in Figure 25(b-d), respectively.

Figure 25. The 2-synapse training circuit and simulation results

At the beginning (t = 0 s), M1 = M2 = 1 and Vout = A1 + A2. After training (t = 200 ms),
M1 and M2 were both low (M1 = M2 = 0), and Vout remained at 0 regardless of the applied
inputs. By t = 400 ms, both memristances had been trained back to high, and Vout = A1 + A2
again. The timing diagram in Figure 26 depicts this training strategy.
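The logic test / training / logic test sequence can be replayed at the Boolean level. This sketch abstracts each millisecond-scale training phase as an instantaneous switch of one memristor (an assumption; the real circuit changes memristance gradually), and the helper names `vout`, `logic_test`, and `train` are hypothetical.

```python
from itertools import product


def vout(a1: int, a2: int, m1: int, m2: int) -> int:
    """Boolean model of Eq. (10)."""
    return (a1 & m1) | (a2 & m2)


def logic_test(m: dict[str, int]) -> list[int]:
    """Apply (A1, A2) = 00, 01, 10, 11 and record Vout (the fast, ns-scale phase)."""
    return [vout(a1, a2, m["M1"], m["M2"]) for a1, a2 in product((0, 1), repeat=2)]


def train(m: dict[str, int], name: str, d_train: int) -> None:
    """Idealized training phase for one memristor (the slow, ms-scale phase).

    The selected path is isolated (A1=1, A2=0 for M1; A1=0, A2=1 for M2), so
    Vout reflects only that memristor. ASSUMPTION: whenever Vout differs from
    D_train, one training phase fully switches the memristor to D_train.
    """
    a1, a2 = (1, 0) if name == "M1" else (0, 1)
    if vout(a1, a2, m["M1"], m["M2"]) != d_train:
        m[name] = d_train


m = {"M1": 1, "M2": 1}             # both memristors start high
tests = [logic_test(m)]            # t = 0 s: Vout = A1 + A2
for name in ("M1", "M2"):
    train(m, name, 0)
tests.append(logic_test(m))        # t = 200 ms: Vout = 0 for every input
for name in ("M1", "M2"):
    train(m, name, 1)
tests.append(logic_test(m))        # t = 400 ms: Vout = A1 + A2 again
print(tests)
```

The three recorded logic tests correspond to the three insets described for Figure 25.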


[Timing diagram: signals A1, A2, D_train, E, and Vout over the phases Logic Test, Training, Logic Test, Training, Logic Test.]

Figure 26. The timing diagram of a 2-synapse training procedure

4.1.5 Self-Training Mode. To shorten the training time and reduce power consumption, we
introduced the concept of self-training into our design. Rather than using a fixed, long
training period (51 ms), the self-training mode automatically stopped programming the
memristor as soon as Vout matched Dtrain.

The proposed training circuit supported the self-training mode by dividing a long training
period into multiple shorter programming periods and checking Vout between them. The
programming period needed to be selected carefully. If it was too short, the delay and
energy overhead of the repeated Vout detection could outweigh the benefit of self-training.
Conversely, if it was too long, self-training would show little improvement over the fixed
training period.
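The trade-off in choosing the programming period can be illustrated with a simple timing model. This is a sketch under stated assumptions: the time the memristor actually needs (`T_NEEDED_MS`) and the cost of one Vout check (`DETECT_MS`) are hypothetical placeholder values, not figures from the report, and only the 51 ms fixed period comes from the text.

```python
import math

FIXED_PERIOD_MS = 51.0  # fixed training period from the text


def self_training_time(t_needed_ms: float, period_ms: float, detect_ms: float) -> float:
    """Total time to train one memristor when programming in slices of period_ms.

    After each slice, Vout is compared with D_train (cost: detect_ms); training
    stops after the first slice whose cumulative programming time reaches
    t_needed_ms. Assumes t_needed_ms <= FIXED_PERIOD_MS.
    """
    slices = math.ceil(t_needed_ms / period_ms)
    return slices * (period_ms + detect_ms)


T_NEEDED_MS = 12.0  # hypothetical: programming time this memristor actually needs
DETECT_MS = 0.5     # hypothetical: overhead of one Vout-vs-D_train check

for period in (0.125, 5.1, 51.0):
    total = self_training_time(T_NEEDED_MS, period, DETECT_MS)
    print(f"slice {period} ms -> {total:.1f} ms total (fixed mode: {FIXED_PERIOD_MS} ms)")
```

With these placeholder numbers, 0.125 ms slices need 96 checks and take 60.0 ms (worse than the fixed 51 ms), 5.1 ms slices take 16.8 ms, and a single 51 ms slice takes 51.5 ms (no gain), which mirrors the too-short / too-long argument above.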

The simulation results in Figure 27 show the memristance change for programming periods
increasing from 5.1 ms to 51 ms in 10 ms steps. The results show that the self-training mode
can significantly reduce the required training time. In theory, the proposed training circuit
can train the memristance to any value between RH and RL; in practice, the training time is
determined by the specific application and neuron functionality.


[Plot: x-axis Time (ms).]
Figure 27. Self-training simulation

4.1.6 Power Analysis. The expected power consumption of the read and training operations is
presented in Table 5. The energy values were obtained for read and write times of 4.5 ns and
51 ms, respectively. Compared with using a separate training circuit for each memristor, the
shared scheme reduced the transistor count of the training circuitry by 26%. Larger savings
in cost and area can be expected when this shared training scheme is applied to multi-synapse
structures with more inputs.

Table 5. Synapse Power Consumption Analysis

Operation                  | Power    | Energy
Read (RL)                  | 1.04 mW  | 4.68 pJ
Read (RH)                  | 113.4 µW | 0.51 pJ
Training (from RH to RL)   | 216.7 µW | 11.1 µJ
Training (from RL to RH)   | 234 µW   | 11.9 µJ
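The energy column of Table 5 follows from E = P × t with the read and write times stated in Section 4.1.6. The short check below recomputes it from the listed power values; the variable names are illustrative only.

```python
READ_TIME_S = 4.5e-9    # read time from the text
WRITE_TIME_S = 51e-3    # write (training) time from the text

# (operation, power in watts, operation time in seconds), values from Table 5
rows = [
    ("Read, RL", 1.04e-3, READ_TIME_S),
    ("Read, RH", 113.4e-6, READ_TIME_S),
    ("Training, RH -> RL", 216.7e-6, WRITE_TIME_S),
    ("Training, RL -> RH", 234e-6, WRITE_TIME_S),
]

energies = [power_w * t_s for _, power_w, t_s in rows]  # E = P * t, in joules
for (name, _, _), energy_j in zip(rows, energies):
    print(f"{name:<20} {energy_j:.3e} J")
```

Rounded to the precision used in the table, these give 4.68 pJ, 0.51 pJ, 11.1 µJ, and 11.9 µJ, so each training operation costs roughly six orders of magnitude more energy than a read.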

4.2 Functionality Verification of the 8-bit ALU Design

In this section, we demonstrate the addition, subtraction, and counting functions of the
proposed synapse-based ALU design. Although our simulations verified the functionality of the
ALU design, it is impractical to present all possible input combinations graphically, so only
a few representative input combinations are shown.

4.2.1 Addition. Figure 28 shows the simulation results of the addition function. In the first
testing period, we set the addend and summand to ‘10011001’ and ‘00010010’, respectively.
After 16 ns, we changed the summand to ‘00111011’. The first expected result, ‘10101011’, was
reached after 4.46 ns, and the second expected result, ‘11010100’, was reached after 22.9 ns.
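The expected results can be reproduced with a bit-level model. This sketch assumes a ripple-carry structure built from one-bit full adders; the report does not state how the synapse-based adder propagates carries, and `full_adder` and `ripple_add` are hypothetical names.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(x: str, y: str) -> tuple[str, int]:
    """Add two equal-length, MSB-first bit strings; return (sum bits, final carry)."""
    carry = 0
    bits = []
    for a, b in zip(reversed(x), reversed(y)):  # walk from the LSB upward
        s, carry = full_adder(int(a), int(b), carry)
        bits.append(str(s))
    return "".join(reversed(bits)), carry


print(ripple_add("10011001", "00010010"))  # first test period -> ('10101011', 0)
print(ripple_add("10011001", "00111011"))  # after the summand changes -> ('11010100', 0)
```

In decimal, these are 153 + 18 = 171 and 153 + 59 = 212, both below 256, so no carry leaves the most significant bit.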


Figure 28. Simulation results of the 8-bit adder (transient response)

4.2.2 Subtraction. Figure 29 shows the simulation results of the subtraction function. In the
first testing period, we set the minuend and subtrahend to ‘10011001’ and ‘00010010’,
respectively. After 16 ns, we changed the minuend and subtrahend to ‘11011001’ and
‘00111000’, respectively. The first expected result, ‘01011110’, was reached after 5.12 ns,
and the second expected result, ‘10100001’, was reached after 27.15 ns.
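The second subtraction result can be checked with 8-bit two's-complement arithmetic. This is a sketch that assumes the ALU subtracts by adding the inverted subtrahend with a carry-in of 1; the text does not describe the internal method, and `subtract_8bit` is a hypothetical name.

```python
def subtract_8bit(minuend: str, subtrahend: str) -> str:
    """minuend - subtrahend via two's complement: minuend + (~subtrahend) + 1, mod 2**8."""
    a = int(minuend, 2)
    b = int(subtrahend, 2)
    inverted = ~b & 0xFF                 # one's complement of the subtrahend (8 bits)
    result = (a + inverted + 1) & 0xFF   # carry-in of 1 completes two's complement; drop bit 8
    return format(result, "08b")


print(subtract_8bit("11011001", "00111000"))  # second test period -> 10100001
```

In decimal this is 217 - 56 = 161, matching the second expected result ‘10100001’.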


Figure 29. Simulation results of the 8-bit subtractor (transient response)


4.2.3 Binary Counting. Figure 30 presents the simulation results of the 8-bit binary counter
driven by a clock with an 8 ns period.
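A behavioral model helps relate the clock period to the counter output. This sketch assumes the counter starts at 0 and increments once per clock period, wrapping modulo 256; the start value is not stated in the text, and `counter_value` is a hypothetical name.

```python
CLOCK_PERIOD_NS = 8  # clock period from the text
WIDTH = 8            # 8-bit counter


def counter_value(n_edges: int, start: int = 0) -> str:
    """Counter output after n_edges clock edges, wrapping modulo 2**WIDTH."""
    return format((start + n_edges) % (1 << WIDTH), f"0{WIDTH}b")


full_cycle_ns = (1 << WIDTH) * CLOCK_PERIOD_NS  # time for a full count sequence

for n in (0, 1, 2, 255, 256):
    print(n * CLOCK_PERIOD_NS, "ns:", counter_value(n))
print("full cycle:", full_cycle_ns, "ns")  # 256 * 8 ns = 2048 ns
```

Under these assumptions, the counter needs 256 × 8 ns = 2048 ns to step through all states and return to ‘00000000’.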


Figure 30. Simulation results of the 8-bit binary counter (transient response)

5.0 CONCLUSIONS


In this project, we proposed a novel synapse design based on the emerging memristor
technology. We also investigated the corresponding logic design that enables adaptive logic
functionality, including the synapse design scheme, training circuitry, multi-level synapses,
and training strategy. The proposed synapse design can be used to construct reconfigurable
systems. A two-level synapse design was used to illustrate the design and training concepts.
An 8-bit ALU capable of addition, subtraction, and binary counting was then designed and
verified using TSMC 0.18 µm technology, and the layout for fabrication was completed.

In the next stage of this project, we plan to extend our research on memristor-based
reconfigurable system design to a broader context, including developing an automated design
flow and a scalable design methodology for large-scale synapse circuits.

LIST OF ABBREVIATIONS AND ACRONYMS

ALU     arithmetic logic unit
P/N     P-type/N-type transistor
TSMC    Taiwan Semiconductor Manufacturing Co.